All Questions

2 votes
1 answer
1k views

Factors that affect the number of iterations of value iteration

I assumed that value iteration would take more iterations to converge as the map size or the environment's complexity increases. I tried to verify this idea by running value iteration on ...
john li
0 votes
1 answer
125 views

What reinforcement learning algorithm should I use in continuous states?

I want to use reinforcement learning in an environment I made. The exact environment doesn't really matter, but it comes down to this: The number of different states in the environment is infinite, e.g. ...
SirPVP
1 vote
1 answer
136 views

In RL, if I assign the rewards for better positional play, the algorithm is learning nothing?

I'm creating an RL application for the game Connect Four. If I tell the algorithm which moves/token positions will receive greater rewards, surely it's not actually learning anything; it's just a ...
mason7663
1 vote
0 answers
66 views

Action spaces for an RTS game

I think reinforcement learning would be a good fit for this problem, but I am not sure how to deal with a seemingly infinite number of actions. At the beginning of each game (generic RTS game), the ...
Quaxton Hale
3 votes
1 answer
9k views

What is the time complexity of the value iteration algorithm?

Recently, I came across the claim (in lectures 8 and 9, on MDPs, of this UC Berkeley AI course) that the time complexity of each iteration of the value iteration algorithm is $\mathcal{O}(|S|^...
Shifat E Arman
18 votes
4 answers
3k views

Why does the discount rate in the REINFORCE algorithm appear twice?

I was reading the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (complete draft, November 5, 2017). On page 271, the pseudo-code for the episodic Monte-Carlo ...
Diego Orellana
2 votes
0 answers
94 views

Which features and algorithm could optimize this air-conditioner problem?

Imagine we have 2 air conditioner systems (AA) and 2 "free cooling" systems which mix external and internal air (FC) in a closed box which always tends to warm up. For each system, we have to find ...
freesoul
2 votes
1 answer
720 views

What algorithm should I use to classify documents?

I'd like to build a program that would learn to automatically classify documents. The principle would be that, for each new document I add to the system, it would automatically infer in which category ...
Charles Brunet
